Or, alternatively, the exact code taken from the book (as of Feb. 12 2023):
RV: random variable
Bernoulli distribution: univariate binomial
PMF: probability mass function (binomial)
CMF: cumulative mass function (binomial)
standard normal distribution: mean = 0, sd = 1
f = PDF: probability density function (continuous)
CDF: cumulative density function (continuous)
k: number of successes
n: number of trials
\(\theta\): the probability of success
AUC: area under the curve
univariate: one variable
bivariate: 2 variables
multivariate: 2+ variables
variance = sd^2
There’s a set of events that can happen, like tossing a coin. A random variable maps each of these possible events to a real number. In tossing a coin until you get a heads, a random variable would be the number of tosses until you get a ‘success’ (heads). In principle it could be an infinite set, bcause you could in theory toss the coin an infinite number of times without ever getting a heads (though this is extremely improbable).
Alternatively, the random variable can be a finite set if we are interested in looking whether one toss results in a head or a tails.
PMF: probability mass function, PDF: probability density function
Simulate tossing a coin 10 times with probability of heads = 0.5, with a Bernoulli random variable.
# simulate tossing a coin 10 times
extraDistr::rbern(n = 10, prob = .5)
## [1] 1 0 1 0 0 1 0 1 1 0
What’s the probablity of getting a tails or a heads? The d-family of functions:
extraDistr::dbern(0,prob=.5)
## [1] 0.5
extraDistr::dbern(1,prob=.5)
## [1] 0.5
Cumulative probability distribution function: the p-family of functions
# probability distribution function
## for whichever case we coded as 0
extraDistr::pbern(0,.5)
## [1] 0.5
## for whichever case we coded as 1
extraDistr::pbern(1,.5)
## [1] 1
When we toss a coin only once, this is a Bernoulli random variable. If we toss the coin more than once, this is a binomial. Both Bernoulli and Binomial have a PMF associated with them.
# generate random binomial data
rbinom(n = 10, size = 1, prob = .5)
## [1] 1 1 1 0 1 0 0 1 0 0
# compute probabilites of a particular outcome
probs <- round(dbinom(0:10, size = 10, prob = .5),3); probs
## [1] 0.001 0.010 0.044 0.117 0.205 0.246 0.205 0.117 0.044 0.010 0.001
x <- 0:10
rbind(x,probs)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## x 0.000 1.00 2.000 3.000 4.000 5.000 6.000 7.000 8.000 9.00 10.000
## probs 0.001 0.01 0.044 0.117 0.205 0.246 0.205 0.117 0.044 0.01 0.001
# compute cumulative probabilities of all possible outcomes
pbinom(0:10, size=10, prob = .5)
## [1] 0.0009765625 0.0107421875 0.0546875000 0.1718750000 0.3769531250
## [6] 0.6230468750 0.8281250000 0.9453125000 0.9892578125 0.9990234375
## [11] 1.0000000000
Compute quantiles using the inverse of the CDF: what is the quantile q such that the probability of X is greater than x?
# generate distribution (0:10 outcomes or less, for 10 repetitions, with probability .5)
probs <- pbinom(0:10,size = 10, prob = .5)
# compute the inverse CDF; qbinom takes as its input a probability of an outcome and outputs the inverse
qbinom(probs, size = 10, prob=.5)
## [1] 0 1 2 3 4 5 6 7 8 9 10
d-p-q-r family of functions
| RV | Function(arguments) | Outcome |
|---|---|---|
| Bernoulli | rbern(n,prob) |
generate random data |
dbern(x,prob) |
PMF: probability of outcome x | |
pbern(q,prob) |
CMF: cumulative PMF of <=q | |
| Binomial | rbinom(n, size, prob) |
generate random data with n = of successes |
dbinom(q,size,prob) |
PMF: probability of outcome x | |
pbinom(q,size,prob) |
CMF: cumulative PMF of <=q | |
qbinom(p,prob,size) |
inverse CMF: cumulative PMF of >=x | |
| normal | rnorm(n,mean,sd) |
generate random data |
dnorm(x,mean,sd) |
PDF: density of outcome x | |
pnorm(q,mean,sd) |
CDF: cumulative PDF of <=q | |
pnorm(q,mean,sd, lower.tail=F) |
inverse CDF: cumulative PDF of >=q | |
qnorm(p,prob,size) |
Compute quantiles corresponding to probabilities |
rnorm(n,mean,sd): Generate random data using
rnorm(n=5,
mean = 0,
sd = 1)
## [1] -1.7423762 1.5093688 0.5218357 1.3696954 0.3483563
pnorm(q,mean,sd): Compute probabilities using CDF
pnorm(q = 2,
mean = 0,
sd = 1)
## [1] 0.9772499
pnorm(q,mean,sd,lower.tail=F): Compute inverse
probabilities using CDF
pnorm(q = 2,
lower.tail=F,
mean = 0,
sd = 1) # don't look the left (equal to or less than q), but the right (equal or greater than q)
## [1] 0.02275013
qnorm(p,mean,sd)): Compute quantiles corresponding to
probabilities using the inverse of the CDF
qnorm(p = 0.9772499,
mean = 0,
sd = 1)
## [1] 2.000001
dnorm(x,mean,sd): Compute the probability
density for a quantile value
dnorm(x = 2,
mean = 0,
sd = 1)
## [1] 0.05399097
curve(dnorm(x,0,1), xlim=c(-3,3), main="Normal(0,1)",
ylab="density")
from.z <- -1
to.z <- 1
S.x <- c(from.z, seq(from.z, to.z, 0.01), to.z)
S.y <- c(0, dnorm(seq(from.z, to.z, 0.01)), 0)
polygon(S.x,S.y, col=rgb(1, 0, 0,0.3))
text(-2,0.15,pos=4,cex=1.5,paste("pnorm(1)-pnorm(-1)"))
arrows(x1=2,y1=0.3,x0=1,y0=dnorm(1),code = 1)
text(1.7,0.32,pos=4,cex=1.5,paste("dnorm(1)"))
points(1,dnorm(1))
#points(1,0)
arrows(x1=2,y1=0.1,x0=1,y0=0,code = 1)
text(1,0.12,pos=4,cex=1.5,paste("qnorm(0.841)"))
x<-rnorm(10)
points(x=x,y=rep(0,10),pch=17)
text(-3,0.05,pos=4,cex=1.5,paste("rnorm(10)"))
arrows(x1=-2.5,y1=0.03,x0=min(x),y0=0,code = 1)
Spoiler: it’s pretty much the mean/mode/median of normally distributed data (where the three would be the same value); gives us the most likely value that the parameter \(\theta\) has given the data
From my book notes from SMLP 2022:
In real exerimental situations we never know the true value of \(\theta\) (probability of an outcome), but it can be derived from the data: \(\theta\) hat = k/n, where k = number of observed successess, n = number of trials, and \(\theta\) hat = observed proportion of successes. \(\theta\) hat = maximum likelihood estimate of the true but unknown parameter \(\theta\). Basically, the mean of the binomial distribution. The variance can also be estimated by computing (n(\(\theta\)))(1 - \(\theta\)). These estimates can be be used for statistical inference.
From my class notes from SMLP 2022:
- common misunderstanding of the maximum likelihood estimate (MLE): it doesn’t represent the true value of \(\theta\), because it’s the MLE (best guess) for the data you have
- but the MLE will be closer to the true value of \(\theta\) as sample size increases
- e.g., the PMF (binomial) contains three terms:
- k: number of successes
- n: total number of trials
- \(\theta\): probability of success (can have values between 0 and 1)
- the PMF is a function of these parameters, based on the data (given our data we know the values of k and n so they are fixed quantities and no longer random), so \(\theta\) can be treated as a variable
the likelihood function is the PMF (or PDF) as a function of the parameters, rather than a function of the data
the MLE from a particular sample of data doesn’t necessarily give us an accurate estimate of the true value of \(\theta\) (because it’s just a sample, of course)
univariate distributions: only one variab le
bivariate or multivariable distributions: multiple variables
Bernoulli distribution: univariate binomial data
joint probability mass function (PMF): the joint distribution of multivariate binomial data
joint probability density function would be for continuous data
each RV has its own PMF that can be computed separately
you can also compute the conditional distribution, i.e., the probability of x given y (x|y) or of y given x (x|y)
only going to discuss bivariate distributions here cause they’re easier to conceptualize
we’d take into account the mean and sd of each variable, as well as the correlation between the two
in a bivariate case, the diagonals (from top left downward) of the VarCorr matrix will contain the variances of either RV
covariance = the correlation of the two RVs multiplied by each of their SDs: Cov(X,Y) = \(\rho\)XY\(\sigma\)X\(\sigma\)Y
this allows us to describe how these 2 RVs are jointly distributed
in the joint PDF: AUC sums to 1 but will be a 3D cone, rather than a 2D curve as in the univariate case
we can also compute the marginal distributions, just as in the bivariate discrete case
simulating data:
# generate simulated bivariate data
## define a VarCorr matrix, where rho = .6, variance
Sigma <- matrix(c(
5^2, 5 * 10 * .6, # variance = sd^2, covariance=sd*sd*rho (.6)
.5 * 10 * .6, 10^2 # covariance=sd*sd*rho (.6) * correlation (.6), variance = sd^2
),
byrow = F, ncol = 2
)
## generate data
u <- as.data.frame(MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = Sigma))
head(u, n=3)
## plot data
# plot(u)
ggplot2::ggplot(u, aes(x = V1, y = V2)) +
labs(title = "rho = .6") +
geom_point()
suppose you have 2 discrete events (A: the streets are wet, B: it is raining)
P(A|B) = P(A,B)/P(B) where P(B) > 0
P(A,B) = P(B,A)
P(B,A) = P(B|A)P(A) = P(A|B)P(B) = P(A,B)
rearranging the terms, we get Bayes’ rule:
but in real world we’re usually working with multivariate (and continuous) data
an example with a discrete RV: if n = 10 trials and k = 7 successes, and if we suppose there are three possible values of $theta$: .1,.5, and .9 and each has the probability of 1/3, the likelihood function would be:
sum(dbinom(x = 7, # k successes
size=10, # n trials
prob=c(0.1,0.5,0.9) # theta values
)
)/3 # divided by 3 because each theta has p = 1/3
## [1] 0.05819729
# a discrete example
dbinom(x = 46, # k successes
size = 100, # n trials
prob = .5 # theta
)
## [1] 0.0579584
shape1 and
shape2dbeta(x, #
shape1, # parameter a
shape2 # parameter b
)
# will spit out the *density* (y-axis value) at a certain point along the curve
dbeta(x = .5,
shape1=1,
shape2=1
)
## [1] 1
dbeta(x = .5,
shape1=3,
shape2=3
)
## [1] 1.875
dbeta(x = .5,
shape1=6,
shape2=6
)
## [1] 2.707031
plot(function(x)
dbeta(x,shape1=1,shape2=1), 0,1,
main = "Beta density",
ylab="density",xlab="theta",ylim=c(0,3))
text(.5,1.1,"a=1,b=1")
plot(function(x)
dbeta(x,shape1=3,shape2=3),0,1,add=TRUE)
text(.5,1.6,"a=3,b=3")
plot(function(x)
dbeta(x,shape1=6,shape2=6),0,1,add=TRUE)
text(.5,2.6,"a=6,b=6")
the higher the values of a and b, the tighter your priors
uninformative prior: a and b = 1 (flat line, all values are equally likely)
normalising the lieklihood allows us to visualize all three (prior, likelihood, posterior) in the same plot on the same scale
## observed values
x <- 46
n <- 100
## prior specification
a <- 210
b <- 21
binom_lh <- function(theta) {
dbinom(x=x,
size=n,
prob=theta)
}
## normalising constant
K <- 1/integrate(f = binom_lh, lower=0,upper=1)$value
binom_scaled_lh <- function(theta)K*binom_lh(theta)
# Plot normalized prior, posterior, and observed data
## Likelihood
p_beta <- ggplot(data = tibble(theta = c(0, 1)), aes(theta)) +
stat_function(
fun = dbeta,
args = list(shape1 = a,
shape2 = b),
aes(linetype = "Prior")
) +
ylab("density") +
stat_function(
fun = dbeta,
args = list(shape1 = x + a,
shape2 = n - x + b),
aes(linetype = "Posterior")
) +
stat_function(
fun = binom_scaled_lh,
aes(linetype = "Scaled likelihood")
) +
theme_bw() +
theme(legend.title = element_blank())
p_beta
thetas<-seq(0,1,length=100)
op<-par(mfrow=c(3,1))
## prior
plot(thetas,dbeta(thetas,shape1=a,shape2=b),type="l",
main="Prior",xlab="theta",ylab="density")
## lik
probs<-rep(NA,100)
for(i in 1:100){
probs[i]<-dbinom(47,100,thetas[i])
}
plot(thetas,probs,main="Likelihood of x|theta_j",type="l",xlab="theta",ylab="density")
## post
x<-seq(0,1,length=100)
a.star<- x + a
b.star<- n - x + b
plot(x,dbeta(x,shape1=a.star,shape2=b.star),type="l",
main="Posterior",xlab="theta",ylab="density")
shape and `rate``# simulate data
round(rgamma(
n=10,
shape=3,
rate=1),
2)
## [1] 3.92 2.20 5.53 0.92 0.86 3.60 0.39 4.29 5.00 4.26
# plot
x<-seq(0,10,by=0.01)
plot(x,dgamma(x,shape=3,rate=1),type="l",
ylab="density")
\(\frac{a}{b}\) = 3 \(\frac{a}{b^{2}}\) = 1.5 = \(\frac{\frac{a}{b}}{b}\) = \(\frac{3}{b}\) = 1.5 then \(\frac{3}{1.5}\) = \(b\) = 2
we’re now going to look at the Poisson-gamma case
if we have a Gamma prior with a Poisson likelihood
recall: the Poisson distribution is for discrete cases and has one parameter: the rate (e.g., number of regressions)
Take aways - when data are sparse, the prior will dominate in determining the posterior mean - when a lot of data are available, the MLE will dominate in determining the posterior mean - give sparse data, informative priors based on expert knowledge, existing data, or meta-analysis will play an important role
Quick review:
\[ P(A\mid B) = \frac{P(B\mid A) P(A)}{P(B)} \tag{2.1} \]
\[ p(\theta\mid y) = \frac{p(y\mid \theta) p(\theta)}{p(y)} \tag{2.1} \]
\(\theta\) can be a vector of parameters in multivariate cases; this makes it difficult to calculate a posterior distribution (joint distribution) on paper
the posterior distribution is of central interest to us
we’re really interested in the posterior distribution up to proportionatlity (without the )
imagine we know what the posterior distribution is
# 95% credible interval
qgamma(
c(.025,.975),
shape=20, # a
rate=7) # b
## [1] 1.745217 4.238693
# generate posterior samples from a gamma distribution: 4000 obvs
lambda_posterior <- rgamma(
40000, # n
shape = 20, # a
rate = 7 # b
)
# now compute CrI using samples from the posterior distribution
round(quantile(lambda_posterior, probs = c(.025,.975)),2)
## 2.5% 97.5%
## 1.75 4.24
this is obtaining samples for the posterior distributions
Bayesian analyses is really uncertainty quantification
these samples from the posterior distribution are what we can get through MCMC sampling
our main goal will always be to obtain this posterior distribution:
\[ p(\theta\mid{y}) \]
An example:
library(bcogsci)
data("df_spacebar")
head(df_spacebar,n=2)
ggplot(df_spacebar, aes(rt)) +
geom_density() +
xlab("response times") +
ggtitle("Button-press data")+theme_bw()
we see the peak is at about 180ms, and there’s a long tail
assume this statistical model (n: the n-th row in the data frame):
\[ rt_n \sim \mathit{Normal}(\mu, \sigma) end{equation} \]
\[ rt_n \sim \mathit{Normal}(\mu, \sigma) \]
\[ rt_n = \mu + \varepsilon \hbox{, where } \varepsilon_n \stackrel{iid}{\sim} \mathit{Normal}(0,\sigma) \]
so it is assumed that the residuals are normally distributed, and that there is some unknown variable \(\mu\) and some unknown parameter \(\sigma\), that represents the variability around \(\mu\), and that the noise (\(\varepsilon\)) is Gaussian (normally distributed)
in a Frequentist model, we’d run:
m <- lm(rt~1, df_spacebar)
coef(m) # intercept, i.e., the mean value of the data
## (Intercept)
## 168.6399
sigma(m) # residual error
## [1] 24.9118
# these are the same as mean and sd of RT
mean(df_spacebar$rt)
## [1] 168.6399
sd(df_spacebar$rt)
## [1] 24.9118
the intercept and residual error are just the maximum likelihood estimates from the data, i.e., the mean and sd
the linear model is pretty much giving you 2 MLE, one for the intercept and for the residual error
Frequentist: \(\mu\) and \(\sigma\) are fixed, unknown point values in frequentist models
in the Bayesian world, \(\mu\) and \(\sigma\) are random variables and need prior distributions specified for them
let’s start with (unrealistic) flat/uniform priors:
\[ \mu \sim Uniform(0,60000)\\ \sigma \sim Uniform(0,2000) \]
uniform priors mean that all the values between a and b are equally likely (hence the ‘flat’ label; it’s not a curve but a flat line)
you can choose whatever prior makes sense to you
the function brm has the same formula syntax as
lm
fit_press <- brm(
# model specification
rt ~ 1,
data = df_spacebar,
# the likelihood assumed:
family = gaussian(), # likelihood has normal distr.
# prior specifications:
prior = c(
prior(
uniform(0, 60000), # uniform/flat prior
class = Intercept, # mean
lb = 0, # lb/ub: technical details, truncate the sampling
ub = 60000
),
prior(uniform(0, 2000), # uniform/flat prior
class = sigma, # sd
lb = 0,
ub = 2000)
),
# sampling specifications
# Technical stuff
# backend = "cmdstanr",
chains = 4, # this is the default
iter = 2000, # this is the default
warmup = 1000 # this is the default
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
chains: the number of independent runs for sampling (by default 4)
iter: the number of iterations that the sampler makes to sample from the posterior distribution of each paramter (by default 2000)
warmup: the number of iterations from the start of sampling that are eventually discarded (by default half of ‘iter’) (in WinBUGS/JAGS this is called burn-in)
relevant reading: sampling and convergence in a nutshell in Ch. 3
plot(fit_press)
we get a visual graphic of how the sample was searching in the posterior distribution
the caterpillar plot shows us the chains sitting on top of each other; a fat hairy caterpillar means the chains are all consistantly sampling from the same distribution, this is good
if we don’t see a afat hairy caterpillar, this indicates a convergence failure (if the chains are not really aligning)
also try using shinystan:
library(shinystan)
launch_shinystan(fit_press)
# extract the posterior distributions as vectors
as_draws_df(fit_press) %>%
head(3)
# compute mean
as_draws_df(fit_press)$b_Intercept %>%
mean()
## [1] 168.6372
# compute 95% CrI for the mean
as_draws_df(fit_press)$b_Intercept %>%
quantile(c(.025,.975))
## 2.5% 97.5%
## 166.0262 171.1923
# compute sd
as_draws_df(fit_press)$sigma %>%
mean()
## [1] 24.99241
# computer 95% CrI for sigma
as_draws_df(fit_press)$sigma %>%
quantile(c(.025,.975))
## 2.5% 97.5%
## 23.22645 27.00463
The model specification, again:
\[ \mu \sim Uniform(0,6000)\\ \sigma \sim Uniform(0,2000)\\ rt_{n} \sim Normal(\mu, \sigma) \]
we can generate the prior predictive distirubiton given the model above (\(\theta =< \mu,\sigma>\)) by integrating out the theta parameters
to do this, do the following many times:
mu <- runif(1, min=0, max=60000)
sigma <- runif(1, 0, 2000)
y_pred_1 <- rnorm(n=5, mu, sigma)
y_pred_1
## [1] 39843.65 39766.19 39666.18 39829.95 39686.40
each sample is an imaginary or portential data set
in the textbook, there’s code for generating prior predictive data using R
what are our options regarding the priors?
in the book, there are some classifications of priors (\(Normal+\) indicates normal distribution truncated at 0, i.e., no negative values):
\[ \begin{align} \mu &\sim Uniform(-10^6,10^6)\\ \sigma &\sim Uniform(0,10^6)\\ rt_{n} &\sim Normal(\mu, \sigma) \end{align} \]
# new model with uninformative priors
fit_press_unif <- brm(
# model specification
rt ~ 1,
data = df_spacebar,
# the likelihood assumed:
family = gaussian(), # likelihood has normal distr.
# prior specifications:
prior = c(
prior(
uniform(-10^6, 10^6), # uniform/flat prior
class = Intercept, # mean
lb = -10^6,
ub = 10^6
),
prior(uniform(0, 10^6), # uniform/flat prior
class = sigma, # sd
lb = 0,
ub = 10^6
)
),
# sampling specifications
# Technical stuff
# backend = "cmdstanr",
chains = 4, # this is the default
iter = 2000, # this is the default
warmup = 1000 # this is the default
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# earlier model
as_draws_df(fit_press)$b_Intercept %>%
quantile(c(0.025, 0.975))
## 2.5% 97.5%
## 166.0262 171.1923
# new flat/uninformative model
as_draws_df(fit_press_unif)$b_Intercept %>%
quantile(c(0.025, 0.975))
## 2.5% 97.5%
## 166.0792 171.2173
\[ \mu \sim Normal(400,10) \sigma \sim Normal_{+}(100,10) rt_{n} \sim Normal(\mu, \sigma) \]
# new model with uninformative priors
fit_press_inf <- brm(
# model specification
rt ~ 1,
data = df_spacebar,
# the likelihood assumed:
family = gaussian(), # likelihood has normal distr.
# prior specifications:
prior = c(
prior(
normal(400, 10),
class = Intercept),
# brms knows that SDs need to be bounded
# to exclude values below zero:
prior(normal(100,10), # informative
class = sigma # sd
)
),
# sampling specifications
# Technical stuff
# backend = "cmdstanr",
chains = 4, # this is the default
iter = 2000, # this is the default
warmup = 1000 # this is the default
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# new flat/uninformative model
as_draws_df(fit_press_inf)$b_Intercept %>%
quantile(c(0.025, 0.975))
## 2.5% 97.5%
## 170.1966 175.7443
these posteriors are shifted a bit to the right (bigger) because the prior was very tight, but not by much
and with a principled prior:
\[ \mu \sim Normal(200,100) \sigma \sim Normal_{+}(50,50) rt_{n} \sim Normal(\mu, \sigma) \]
fit_press_prin <- brm(rt ~ 1,
data = df_spacebar,
family = gaussian(),
prior = c(
prior(normal(200, 100), class = Intercept),
# brms knows that SDs need to be bounded
# to exclude values below zero:
prior(normal(50, 50), class = sigma)
),
# sampling specifications
# Technical stuff
# backend = "cmdstanr",
chains = 4, # this is the default
iter = 2000, # this is the default
warmup = 1000 # this is the default
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# new principled model
as_draws_df(fit_press_prin)$b_Intercept %>%
quantile(c(0.025, 0.975))
## 2.5% 97.5%
## 166.1515 171.2139
\[ \begin{align} log(y) &\sim Normal(\mu,\sigma)\\ y &\sim LogNormal(\mu,\sigma) \end{align} \]
mu <- 6
sigma <- 0.5
N <- 500000
# generate N random smaples from log-normal distribution
sl <- rlnorm(N, mu, sigma)
ggplot(tibble(samples = sl), aes(samples)) +
geom_histogram(aes(y = ..density..), binwidth = 50) +
ggtitle("Log-normal distribution\n") +
coord_cartesian(xlim = c(0, 2000))
\[ rt_n \sim LogNormal(\mu,\sigma) \]
\[ \begin{align} \mu &\sim Uniform(0,11)\\ \sigma &\sim Uniform(0,1) \end{align} \] - generate prior predictive distributions:
normal_predictive_distribution <-
function(mu_samples, sigma_samples, N_obs) {
# empty data frame with headers:
df_pred <- tibble(
trialn = numeric(0),
rt_pred = numeric(0),
iter = numeric(0)
)
# i iterates from 1 to the length of mu_samples,
# which we assume is identical to
# the length of the sigma_samples:
for (i in seq_along(mu_samples)) {
mu <- mu_samples[i]
sigma <- sigma_samples[i]
df_pred <- bind_rows(
df_pred,
tibble(
trialn = seq_len(N_obs), # 1, 2,... N_obs
rt_pred = rnorm(N_obs, mu, sigma),
iter = i
)
)
}
df_pred
}
N_samples <- 1000
N_obs <- nrow(df_spacebar)
mu_samples <- runif(N_samples, 0, 11)
sigma_samples <- runif(N_samples, 0, 1)
prior_pred_ln <- normal_predictive_distribution(
mu_samples = mu_samples,
sigma_samples = sigma_samples,
N_obs = N_obs
) %>%
mutate(rt_pred = exp(rt_pred))
prior_pred_ln %>%
group_by(iter) %>%
summarize(
min_rt = min(rt_pred),
max_rt = max(rt_pred),
mean_rt = mean(rt_pred),
median_rt = median(rt_pred)
) %>%
pivot_longer(cols = ends_with("rt"), names_to = "stat", values_to = "rt") %>%
mutate(stat = factor(stat, levels = c("mean_rt", "median_rt", "min_rt", "max_rt"))) %>%
ggplot(aes(rt)) +
scale_x_continuous("Response times in ms",
trans = "log", breaks = c(0.001, 1, 10, 100, 1000, 10000, 100000)
) +
geom_histogram(aes(y = ..density..)) +
ylab("density") +
facet_wrap(~stat, ncol = 1)+theme_bw()
we know they’re ridiculous because of our prior knowledge about reaction times
more informative priors:
\[
\begin{align}
\mu &\sim Normal(6,1.5)\\
\sigma &\sim Normal_+(0,1)
\end{align}
\] - and update our model + brms knows sd is
truncated at 0 so we don’t have to specify lb and
ub here; if you were writing directly in Stan code you’d
have to specify it yourself
df_spacebar_ref <- df_spacebar %>%
mutate(rt = rep(1, n()))
fit_prior_press_ln <- brm(rt ~ 1,
data = df_spacebar_ref,
family = lognormal(), # log instead of gaussian
prior = c(
prior(normal(6, 1.5), class = Intercept), # on the log scale
prior(normal(0, 1), class = sigma)
),
sample_prior = "only", # produce prior samples
control = list(adapt_delta = .9) # this was added for convergence reasons
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
pp_check(fit_prior_press_ln, type = "stat", stat = "mean",
prefix = "ppd") +
coord_cartesian(xlim = c(0.001, 300000)) +
scale_x_continuous("Response times [ms]",
trans = "log",
breaks = c(0.001, 1, 100, 1000, 10000, 100000),
labels = c(
"0.001", "1", "100", "1000", "10000",
"100000"
)
) +
ggtitle("Prior predictive distribution of means")
p1 <- pp_check(fit_prior_press_ln, type = "stat", stat = "mean", prefix = "ppd") +
coord_cartesian(xlim = c(0.001, 300000)) +
scale_x_continuous("Response times [ms]",
trans = "log",
breaks = c(0.001, 1, 100, 1000, 10000, 100000),
labels = c(
"0.001", "1", "100", "1000", "10000",
"100000"
)
) +
ggtitle("Prior predictive distribution of means")
p2 <- pp_check(fit_prior_press_ln, type = "stat", stat = "min", prefix = "ppd") +
coord_cartesian(xlim = c(0.001, 300000)) +
scale_x_continuous("Response times [ms]",
trans = "log",
breaks = c(0.001, 1, 100, 1000, 10000, 100000),
labels = c(
"0.001", "1", "100", "1000", "10000",
"100000"
)
) +
ggtitle("Prior predictive distribution of minimum values")
p3 <- pp_check(fit_prior_press_ln, type = "stat", stat = "max", prefix = "ppd") +
coord_cartesian(xlim = c(0.001, 300000)) +
scale_x_continuous("Response times [ms]",
trans = "log",
breaks = c(0.001, 1, 100, 1000, 10000, 100000),
labels = c(
"0.001", "1", "10", "1000", "10000",
"100000"
)
) +
ggtitle("Prior predictive distribution of maximum values")
cowplot::plot_grid(p1, p2, p3, nrow = 3, ncol =1)
fit_press_ln <- brm(rt ~ 1,
data = df_spacebar,
family = lognormal(),
prior = c(
prior(normal(6, 1.5), class = Intercept),
prior(normal(0, 1), class = sigma)
)
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
fit_press_ln
## Family: lognormal
## Links: mu = identity; sigma = identity
## Formula: rt ~ 1
## Data: df_spacebar (Number of observations: 361)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 5.12 0.01 5.10 5.13 1.00 3730 2727
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.13 0.01 0.13 0.14 1.00 3162 2523
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
now we’ll evaluate the log-normal model with these relatively informative priors
back-transforming to ms:
estimate_ms <- exp(as_draws_df(fit_press_ln)$b_Intercept)
round(c(mean = mean(estimate_ms),
sd = sd(estimate_ms),
quantile(estimate_ms,
probs = c(.025,.975))),
1)
## mean sd 2.5% 97.5%
## 167.1 1.2 164.7 169.5
these credible intervals are conditioned on the data we happened to get, they don’t reflect any true mean necessarily
we can also do a posterior predictive check:
pp_check(fit_press_ln, ndraws = 100)
fit_press_norm <- brm(rt ~ 1,
data = df_spacebar,
family = gaussian(),
prior = c(
prior(uniform(0, 60000), class = Intercept, lb = 0,
ub = 60000),
prior(uniform(0, 2000), class = sigma, lb = 0,
ub = 2000)
),
chains = 4,
iter = 2000,
warmup = 1000
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
ggpubr::ggarrange(
pp_check(fit_press_norm,
type = "stat", stat = "min") +
ggtitle("Normal model (min)")+
theme(legend.position = "none"),
pp_check(fit_press_norm,
type = "stat", stat = "max") +
ggtitle("Normal model (max)")+
theme(legend.position = "none"),
pp_check(fit_press_ln,
type = "stat", stat = "min") +
ggtitle("Log model (min)")+
theme(legend.position = "none"),
pp_check(fit_press_ln,
type = "stat", stat = "max") +
ggtitle("Log model (max)") +
theme(legend.position = "none"),
cowplot::get_legend(pp_check(fit_press_norm) + theme(legend.position = "bottom")),
nrow = 3, ncol = 2,
heights = c(.45,.45,.1),
labels = c("A","B","C","D")
)
we saw 2 simple examples of a liner model, with two differet likelihoods (normal and log normal)
one key skill we learned was to examine and interpret the piror predictive distribution
another key skill: interpreting the posterior predictive distribution
these two distributions tell us how well themodel represents the realiy, both before and after observing the particular data we have
Next:
participant has to track between 0:5 objects on the screen, when there are several distractor objects on the screen as well
examining attentional load, and its effect on pupil size
our model:
\[ p\_size_n \sim Normal(\alpha + c_load_n \cdot \beta, \sigma) \]
data("df_pupil_pilot")
df_pupil_pilot$p_size %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 851.5 856.0 862.0 860.8 866.5 868.0
\[ \alpha \sim Normal(1000,500) \]
round(
qnorm(c(.025,.975),
mean = 1000,
sd = 500),
2)
## [1] 20.02 1979.98
\[ \sigma \sim Normal_+(0,1000) \]
round(extraDistr::qtnorm(
c(.025, .975),
mean = 0,
sd = 1000,
a = 0 # truncate at 0
),
1)
## [1] 31.3 2241.4
\[ \beta \sim Normal(0,100) \]
round(
qnorm(c(.025,.975),
mean = 0, sd = 100),
1)
## [1] -196 196
these are what we’re saying we think the plausible values of \(\beta\) will be
let’s now work with the ‘real’ data, and centre the predictor
we see this data is repeated measure, because we have multiple data points from each participant for each load (2 times load = 0, for example)
data("df_pupil")
df_pupil <- df_pupil %>%
mutate(c_load = load - mean(load)); head(df_pupil,7)
# only sigma is truncated at 0 in our priors, and it is also truncated at 0 by default in stan
# therefore, we don't need to set any ub/lb
fit_pupil <- brm(p_size ~ c_load,
data = df_pupil,
family = gaussian(),
prior = c(
prior(normal(1000,500), class = Intercept),
prior(normal(0,1000), class = sigma),
prior(normal(0,100), class = b, coef = c_load)
))
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
plot() function to check the convergence
b_c_load)plot(fit_pupil)
we see:
we see sigma is in the positive range
we also see the chains are on top of each other (and so are sampling from the same distribution)
# a function they wrote, will be shared with us
short_summary(fit_pupil)
for (l in 0:4) { # for the cognitive load levels 0:4
df_sub_pupil <- filter(df_pupil, load == l) # filter out other loads
d <- pp_check(fit_pupil, # run pp_check
type = "dens_overlay",
ndraws = 100,
newdata = df_sub_pupil # using this new data with filtering
) +
# and plot it
geom_point(data = df_sub_pupil, aes(x = p_size, y = 0.0001)) +
ggtitle(paste("load: ", l)) +
coord_cartesian(xlim = c(400, 1000))
# print(d)
# and store object as p_l
name <- paste0("dens_",l)
assign(name, d)
}
ggpubr::ggarrange(
dens_0 + theme(legend.position = "none"),
dens_1 + theme(legend.position = "none"),
dens_2 + theme(legend.position = "none"),
dens_3 + theme(legend.position = "none"),
dens_4 + theme(legend.position = "none"),
cowplot::get_legend(dens_4),
nrow=2, ncol = 3)
for (l in 0:4) {
df_sub_pupil <- filter(df_pupil, load == l)
p <- pp_check(fit_pupil,
type = "stat",
ndraws = 1000,
newdata = df_sub_pupil,
stat = "mean"
) +
geom_point(data = df_sub_pupil, aes(x = p_size, y = 0.1)) +
ggtitle(paste("load: ", l)) +
coord_cartesian(xlim = c(400, 1000))
# print(p)
# and store object as p_l
name <- paste0("p_",l)
assign(name, p)
}
ggpubr::ggarrange(
p_0 + theme(legend.position = "none"),
p_1 + theme(legend.position = "none"),
p_2 + theme(legend.position = "none"),
p_3 + theme(legend.position = "none"),
p_4 + theme(legend.position = "none"),
cowplot::get_legend(p_4),
nrow=2, ncol = 3)
trial as a predictor to see if there’s a practice
effect (speed up) or fatigue effet (slowdown)df_spacebar <- df_spacebar %>%
mutate(c_trial = trial - mean(trial))
\[ rt_n \sim LogNormal(\alpha + c_trial_n \cdot \beta, \sigma) \]
\(N\): total number of (independent) data points
\(n = 1,...,N\), and \(rt\): the dependent variable (RTs in milliseconds)
because we’re assuming a log normal likelihood, we now have to define priors on the \(\alpha\), \(\beta\), and \(\sigma\) parameters in the log scale
\[ \begin{align} \alpha &\sim Normal(6,1.5)\\ \sigma &\sim Normal_+(0,1) \end{align} \]
\[ \beta \sim Normal(0,1) \]
brm
function, but with:
family = lognormal() to set log normal
distributionsample_prior = "only" to not compute posteriors# Ignore the dependent variable,
# use a vector of ones to create dummy data
df_spacebar_ref <- df_spacebar %>%
mutate(t = rep(1, n()))
fit_prior_press_trial <- brm(t ~ 1 + c_trial,
data = df_spacebar_ref, # dummy data, cause we're looking at priors only
family = lognormal(), # log normal distribution
prior = c(
prior(normal(6, 1.5), class = Intercept),
prior(normal(0, 1), class = sigma),
prior(normal(0, 1), class = b, coef = c_trial)
),
sample_prior = "only", # PRIOR PRED DISTR
control = list(adapt_delta = .9)
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# calculate median
median_diff <- function(x) {
median(x - lag(x), na.rm = TRUE)
}
# plot pp_check for priors
pp_check(fit_prior_press_trial,
type = "stat",
stat = "median_diff",
# show only prior predictive distributions
prefix = "ppd",
# each bin has a width of 500ms
binwidth = 500) +
# cut the top of the plot to improve its scale
coord_cartesian(ylim = c(0, 50))
\[ \beta \sim Normal(0,0.01) \] - try it again with these more informative priors
fit_prior_press_trial <- brm(t ~ 1 + c_trial,
data = df_spacebar_ref, # dummy data, cause we're looking at priors only
family = lognormal(), # log normal distribution
prior = c(
prior(normal(6, 1.5), class = Intercept),
prior(normal(0, 1), class = sigma),
prior(normal(0, .01), class = b, coef = c_trial)
),
sample_prior = "only", # PRIOR PRED DISTR
control = list(adapt_delta = .9)
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.1/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.1/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# calculate median
median_diff <- function(x) {
median(x - lag(x), na.rm = TRUE)
}
# plot pp_check for priors
pp_check(fit_prior_press_trial,
type = "stat",
stat = "median_diff",
# show only prior predictive distributions
prefix = "ppd",
# each bin has a width of 500ms
binwidth = 500) +
# cut the top of the plot to improve its scale
coord_cartesian(ylim = c(0, 50))